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Abstract 

Background: Reverse engineering in systems biology entails inference of gene regulatory networks from 
observational data. This data typically include gene expression measurements of wild type and mutant cells in 
response to a given stimulus. It has been shown that when more than one type of experiment is used in the 
network inference process the accuracy is higher. Therefore the development of generally applicable and effective 
methodologies that embed multiple sources of information in a single computational framework is a worthwhile 
objective. 

Results: This paper presents a new method for network inference, which uses multi-objective optimisation (MOO) 
to integrate multiple inference methods and experiments. We illustrate the potential of the methodology by 
combining ODE and correlation-based network inference procedures as well as time course and gene inactivation 
experiments. Here we show that our methodology is effective for a wide spectrum of data sets and method 
integration strategies. 

Conclusions: The approach we present in this paper is flexible and can be used in any scenario that benefits from 
integration of multiple sources of information and modelling procedures in the inference process. Moreover, the 
application of this method to two case studies representative of bacteria and vertebrate systems has shown 
potential in identifying key regulators of important biological processes. 



Background 

In the last ten years the development of functional 
genomics technologies has provided us with the ability 
to generate quantitative data representing the molecular 
state of cells and tissues at a genome level [1,2]. These 
datasets can be in the form of a time series representing 
the dynamics of gene expression profiles (e.g. mRNA, 
proteins and metabolites) in response to a given stimu- 
lus, such as an environmental perturbation, the effect of 
a growth factor or an experimentally induced gene dele- 
tion. Despite the relatively large amount of information, 
predicting underlying regulatory networks from observa- 
tional data is still not trivial and is a matter of intense 
research [3]. 
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A number of reverse-engineering approaches have 
been proposed. Some of these are designed to infer net- 
works from a compendium of perturbation experiments 
while others are able to use time course data to develop 
dynamical models of gene interaction. Bayesian net- 
works have been among the first to be applied to biolo- 
gical problems [4]. They work by inferring probabilistic 
relationships between variables, can use either time 
course or steady state data and allow integration of 
prior knowledge in the model. Correlation-based meth- 
ods [5,6] compute correlation coefficients between vari- 
ables to infer the underlying network topology. State- 
space models (SSMs) [7,8], and ODE-based methods 
[9,10], on the other hand use time-course data to 
develop dynamic models of gene regulatory networks 
(GRN). For an extensive overview of these methodolo- 
gies see: [11,12]. 
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The general validity of the principal of integrating 
multiple data sources in the reverse-engineering process 
is exemplified by the observation that the best perform- 
ing methods utilize some degree of integration between 
different experiments [13]. For example, the top per- 
forming method in the third edition of the "Dialogue for 
Reverse Engineering Assessments and Methods" 
(DREAM), developed by Yip et al. [14], was based on a 
combination of a statistical error-model and ODE mod- 
eling to integrate gene knock-out (KO) and time-course 
experiments. Interestingly, Yip et al. [14] also noted that 
a relatively simple differential gene-expression analysis, 
comparing wild-type and mutant strains, was in itself a 
very good representation of the underlying gene regula- 
tory network. However, not all KO experiments are 
likely to be equally informative and identifying a priori 
the most relevant genes is not a trivial task. Moreover, 
large-scale gene-inactivation experiments are not a 
viable option for many non-model species. 

Therefore, there is the need to expand the repertoire 
of available network inference tools by developing more 
methods that allow integration of multiple data sources 
and have the flexibility to use a wide range of datasets 
and information. In order to achieve this objective, we 
set out to develop a computational framework that has 
the potential to combine different inference methodolo- 
gies, multiple datasets, as well as any pre-existing biolo- 
gical knowledge. We based this approach on an ODE 
framework combined to a multi-objective optimization 
(MOO) procedure for parameter estimation. We named 
this method "Network-Inference with Multi Objective 
Optimization' (NIMOO). 

Methods 

The basic network inference framework: Model Equations 
and parameter estimation of a single objective 
optimization procedure 

Gene interactions in a regulatory network can be modelled 
using a set of ordinary differential equations [9,10]. In this 
implementation we have used a linear ODE model where 
the interaction between genes is additive. In this context, 
changes in the expression of a given gene depend on a 
weighted linear sum of the expression of its regulators: 

N 

Xi = ^2 ^ij^i + ^i^i (1) 

i=i 

where, Xi represents expression level for gene /, bi 
represents the effect of the external perturbation on 
gene i, and, N is the number of genes in the dataset. 
The parameter matrix w is obtained by minimizing the 
Squared Error (£^^^) 



^SQE = J2Y, (x—'^ - (2) 

1=1 t 

The gene regulatory network (GRN) is then inferred 
from the optimized parameter matrix w. The matrix ele- 
ment I Wij I indicates the strength of the interaction 
between genes / and ; (with gene ; regulating gene /), 
and, sign (wij) indicates whether the effect is excitatory 
(Wij > 0) or inhibitory {w^ < 0) 

In our implementation of single objective optimisation 
(SOO), minimisation of E^^^ was achieved using the 
trust-region method based on the interior-reflective 
Newton method [15,16]. In this method the minimisa- 
tion process involves defining a trust region where the 
objective function SQE can be approximated with a sim- 
pler function q. For successive iterations, function q, in 
conjunction with the Preconditioned Conjugate Gradi- 
ent Method [16], is used to find a new trust region 
where the function SQE is lower. The process is termi- 
nated when the change in function value is less than a 
pre-determined tolerance (10'^). 

Parameter estimation using a multi-objective 
optimization procedure 

Multi-objective optimisation (MOO) is based on mini- 
misation of E^^^ in conjunction to additional objective 
functions, E^^^^^\ which are built as Euclidean distance 
between the parameter matrix w and objectives O con- 
structed from additional data and/or existing knowledge: 

^^''' = T^io,-^.if- (3) 

To implement multi-objective optimization we have 
used the goal attainment method [17,18]. In this method 
the problem of simultaneously optimizing multiple func- 
tions is reduced to the task of standard minimization. A 
set of goah [Ji, J 2, —J ml and weights [61, 62, Oy^] are 
assigned to the objective functions F = [Fj, F2, F^], 
where, Fj = E^^^, F2 = etc. Also, a scalar dummy 

variable y is introduced so that the aim is to minimize 
for y such that 

Fk[w) - OkY < Jk, k = hm. (4) 

The term O/^y introduces flexibility in the degree of 
goals attained. Also, the weight factor 9^ can be used 
to assign relative importance to the objectives: Thus, 
from Equation 4, 9/^ = 0 implies hard goal for the cor- 
responding objective function F/^, For all results pre- 
sented in this paper, unless mentioned otherwise, the 
goal and weight corresponding to the objective SQE 
were set to 0. 
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Overall strategy for the development of a MOO-based 
inference method 

Figure 1 shows in a schematic format the different pro- 
cedures that are part of the NIMOO framework and 
their relationships with the experimental datasets. 

The principle behind NIMOO, as detailed in the 
above sections, is to infer the gene regulatory matrix 
(GRM) w by minimizing the E^^^ for the ODE system 
in conjunction with additional objective functions, E"^' 
which represent the distance between the parameter 
matrix Wij and objectives Oij constructed from additional 
information. 

In principal, objectives Oij can be constructed from 
any data available on the underlying regulatory network. 
In this paper, we focused on two possible scenarios. 



In the first case, we considered the possibility that 
MOO might be used to integrate two different network- 
inference procedures, for example applied to indepen- 
dent replicates of a time-course experiment. In this 
implementation we used time-delayed Spearman rank- 
correlation [6] to develop a matrix Oij (Equation 3) 
representing the degree of statistical correlation from 
any pair of genes within a set time delay interval (Figure 
1, objective DSp). 

Alternatively, Oij can be built from the results of gene 
inactivation experiments. We reasoned that these 
experiments might fall in at least two categories. In the 
first case the gene is deleted at some stage of the life- 
cycle of the organism so that gene expression measure- 
ments can only be acquired after the new steady state 
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Figure 1 Overview of the NIMOO methodology. The figure shows in a schematic format the relationships between type of experiment, 
methods used to build the Oy objectives and the MOO procedures. Details are given in the Methods section. 
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has been reached. This can be easily achieved, by a 
plethora of gene knockouts (KOs) methodologies in 
model systems ranging from E. coli to mice. Alterna- 
tively, gene inactivation could be achieved by using bio- 
chemical inhibitors or RNA interference. In this 
scenario mRNA expression can be monitored at differ- 
ent time points following intervention. In the context of 
MOO both scenarios lead to a gene-expression matrices 
where rows are represented by genes and columns by 
gene KO experiments. From each of these matrices an 
objective Oij can be computed to represent the relation- 
ship between every gene pair (Figure 1, objectives Tc, 
Tr are derived from expression data at a given time- 
point shortly after gene inactivation whereas objectives 
Sc and Sr derive from expression data at a single time 
point at steady state; for further details on how to com- 
pute Oij see sections below). 

Different procedures may be used in combination 
using an ensemble approach; in this paper we describe 
the results of combining MOO-Tr with MOO-Tc (Fig- 
ure 1, MOO-Tens) and MOO-Sr with MOO-Sc (Figure 
1, MOO-Sens). 

All MOO procedures developed within NIMOO have 
been compared with the accuracy of an ODE model 
developed by minimizing E^^^, a procedure that we 
called single-objective optimization (Figure 1, SOO). 

The paragraphs below describe in detail how the dif- 
ferent objectives were computed. 

Construction of a time-delay correlation matrix (objective 
DSp) 

To test the potential of MOO to combine different net- 
work inference approaches we choose to build an objec- 
tive based on time-delayed Spearman Rank-correlation 
[6] (Figure 1, objective DSp). DSp was computed as fol- 
lows: For each gene pair (/,;), the expression profile of 
gene / is shifted along the time axis with respect to that 
of gene The Spearman Rank-correlation coefficient is 
calculated for varying time delays and the largest coeffi- 
cient from this list forms the {ijy^ element of the 
delayed Spearman Rank-correlation matrix d-SRC, We 
also construct a time delay matrix dt from the corre- 
sponding values. The objective DSp is then obtained 
from d-SRC by equating all d-SRC(iJ) = 0 for which dt 
(ij) < to, so that only gene pairs with delay of to or 
more are considered. 

Construction of a gene KO matrix: a ratios-based 
procedure 

The objectives Tr and Sr (Figure 1) were constructed by 
computing the ratios between the expressions of each 
gene / in the mutant j and the expression of gene / in 
the wild type. The expression of gene / is taken either at 
a given time point tp after inactivation (Tr) or at the 



steady state (Sr). We selected tp as the time point where 
the largest numbers of genes have the highest derivative 
in absolute value. We found that this heuristic rule 
allowed us to identify a value of tp, which often (8 out 
of 9 networks tested) corresponded to the highest AUG 
values within a tolerance of 5% (Figure SI). 

Construction of a gene KO matrix: a correlation-based 
procedure 

The objectives Tc and Sc (Figure 1) were computed by 
calculating the correlation between the expressions of 
every pair of genes (gene gene ;) across all KO sam- 
ples. Similarly to the ratio procedure, the Tc matrix was 
built using the measure of gene i expression at time tp, 
where tp was chosen as detailed above. 

Combining MOO procedures using an ensemble approach 

The ratio and correlation methods were integrated to 
produce a single model by using an ensemble approach. 
Within this procedure, a GRM Wa was constructed so 
that |wa(i,j)| = |wr (i,j)| and sign(wa(i,j)) = sign(wc (i,j); 
Where Wr and Wc represent two GRMs obtained from 
the ratio and correlation procedures, respectively. As 
exemplified in Figure 1, MOO-Sens represent the result 
of combining the MOO-Sr and MOO-Sc procedures 
whereas MOO-Tens> is the result of combining the 
MOO-Tr and MOO-Tc procedures. 

Simulated data 

The validation study has been performed using the java 
application GeneNetWeaver (GNW) http://gnw.source- 
forge.net[19]. This network generator has been used as 
part of the DREAM Initiative [20]. It builds synthetic 
networks by specifying a biologically relevant topology 
and implementing an ODE model to generate synthetic 
data. GNW grows the initial topology from a seed 
node (selected randomly) in a Source Gene Network {E. 
Coli in this application) by progressively adding a ran- 
domly selected neighbouring node till the desired size 
is reached. Each model can be used to generate simu- 
lated time course gene expression data either with the 
intact network or following deletion of one of the 
nodes. 

We tested the performance of MOO in conjunction 
with the objectives D-Sp (MOO-Sp), Tc (MOO-Tc), Tr 
(MOO-Tr), Sc (MOO-Sc) and Sr (MOO-Sr). Each of 
these procedures was applied to ten independent net- 
works of size 20, 35 and 50 genes. The gene KOs data- 
set associated with every network was build by 
generating synthetic data after the stepwise deletion of 
each gene in the network. 

All GNW-generated network-models were used to 
simulate time-series datasets (26 time points, t_max = 
200) as well as steady-state data for all KOs. 



Gupta et al. BMC Systems Biology 201 1, 5:52 
http://www.biomedcentral.eom/1752-0509/5/52 



Page 5 of 14 



Data processing and optimisation procedure 

Noise was added to the simulated data (5% of the signal) 
to mimic variability in experimental replication. Polyno- 
mial fitting was used for interpolation of the time-series 
data [21] after averaging three experimental replicates. 
200 interpolated, equally spaced time-points were then 
computed and used as input of the MOO procedures. 
Optimisation of the matrix w was initiated from a ran- 
domly generated matrix. In order to test the reproduci- 
bility of the optimization methods, fifty independent 
runs of optimization from each MOO procedure were 
carried out for a subset of the GNW networks. We 
found that the AUCs values were always within 0.2%. 

Network inference accuracy 

In order to evaluate various MOO methods we com- 
pared the inferred gene-regulatory matrix w with the 
true network topologies. The accuracy of the inference 
process for undirected networks was quantified by using 
the area under curve (AUG) of a ROC plot (False Posi- 
tive Rate (FPR) versus True Positive Rate (TPA)). For 
direct-signed networks the AUG was computed by plot- 
ting TNR (True Negative Rate) versus TPR as described 
in [10]. The distribution of AUG values for boxplots and 
these represented each batch of networks were com- 
pared when appropriate using a Wilcoxon's non-para- 
metric rank sum test [22]. 

Modelling in vivo tumour development 

In order to assess the potential of NIMOO to model 
true biological systems we have used two microarray 
datasets generated in our laboratory. 

We first used an in vivo model of glioblastoma devel- 
opment to test the MOO-Sp procedure. In this experi- 
mental model [23] U87 human glioma cells (ATGG, 
USA) were maintained in DMEM with 10% FBS, anti- 
biotics, and 1-glutamine. Fertilized chicken eggs {G alius 
gallus; E.A.R.L. Morizeau, Dangers, France) were incu- 
bated at 37°G and 80% humidified atmosphere. On day 
4 of development, a window was made in the eggshell 
after punctuating the air chamber and sealed with Dura- 
pore tape. On embryonic day 10, a plastic ring was 
placed on the embryo chorioallantoic membrane 
(GAM), and 3 million to 5 million U87 cells in 20 [d of 
medium were deposited after gentle laceration of sur- 
face. Implantation experiments were performed in tripli- 
cate, and, from day 11 to day 15, mRNA from the 
growing tumour was extracted every 12 hours using the 
standard protocol provided in the Qiagen RNeasy kit. 
LabelUng was performed using protocol V5.7 provided 
in Agilent's Quick Amp One-Golour labelling kit and 
hybridized onto human Agilent microarrays (AMA- 
DID:014850). Data were normalized using quantile nor- 
malization and genes differentially expressed over time 



were identified using the statistical methodology SAM 
[24]. 58 genes were selected among the most differen- 
tially expressed across the time course (Table S3) and 
used as input of the modelling procedure. 

Modelling E coli acid stress 

In order to fully test the potential of MOO methodology 
we have applied the MOO-Sgns procedure to model the 
E. coli response to mild acid conditions, a stress signal 
relevant to pathogenesis in diarrheagenic E. coli strains 
[25]. The procedure was used to integrate two microar- 
ray datasets representing the dynamic response of the E, 
coli MG1655 strain to acid exposure (pH = 5.5) and a 
gene KO experiment performed in the related strain E. 
coli BW 25113, representing the transcriptional state of 
strains mutated in the 26 most differentially expressed 
genes. In this analysis we aim to reverse engineer the 
interactions between these 26 genes. The time-course 
analysis of the response of the E, coli strain MG1655 to 
acid exposure was performed maintaining a constant 
cell number (ODeoo nm = 2) using a media replenish- 
ment procedure. Samples were collected every 5 minutes 
for 1 hour in three replicated experiments. Mutant 
strains representing 26 of the most differentially regu- 
lated genes over time were selected from the BW25113 
KEIO mutant collection [26] and analysed using micro- 
arrays as described below. Experiments were performed 
exactly in the same conditions as the MG1655 strain 
but only control and 15 minutes in acid were processed 
for microarray analysis. 

Microarray analysis was performed as follows. 10 ml 
of cultures were samples at the different time points 
and stabilized by adding a solution of phenol-ethanol 
(final concentration of 19% phenol and 1% ethanol). Gell 
pellets were recovered by centrifugation and stored at 
-80°G. mRNA was extracted using the standard protocol 
provided in the Quiagen RNEasy kit (QUIAGEN, USA) 
and labelled with Gy5 labelled dGTP (Amersham Bios- 
ciences, USA) using the GyScribe Post-Labelling Kit 
(Amersham Biosciences, USA). Probes were then puri- 
fied using GyScribe purification Kit (Amersham Bios- 
ciences, USA) according to the manufacturer's 
instructions. Labelled probes (80 pmol) were then hybri- 
dized on Operon E, coli Ultra GAPS microarray sUdes 
(Gorning, USA). Slides were washed in AdvaWash auto- 
mated washing station (Adavlytix, USA) and scanned 
with the ScanArray® GX (PerkinElmer®, USA), usm2 
the ScanArray® software. Data were normalized using 
quantile normalization and genes differentially expressed 
over time were identified using the statistical methodol- 
ogy SAM [25]. We modelled the E. coli datasets by 
using the ensemble approach integrating both correla- 
tion and ratio procedures as described above. In order 
to generate comparable sparse networks we thresholded 
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the connectivity matrix w to obtain networks with same 
number of genes in the networks (25). 

Method implementation and datasets availability 

NIMOO was implemented in MATLAB [27]. Code and 
datasets are available at this URL: http://biptemp.bham. 
ac.uk/NI_MOO/NI_MOO.zip. 

Results 

Combining different inference methodologies within the 
MOO framework improves the accuracy of network 
reconstruction 

The first scenario we considered involved combining 
two network inference methods to model replicated 



time course experiments. To achieve this, we used 
delayed Spearman Rank-correlation [6] to build the 
objective Oij (Equation 3) for MOO. 

We discovered that the simple time-delayed correla- 
tion matrix DSp (Figure 1) was more effective than 
SOO to reverse engineer undirected networks of size 20 
and 35 (up to 10% increase, p < 0.05) (Figure 2A, C and 
Table 1). The MOO-DSp procedure was always more 
effective than SOO (up to 11% increase, p < 10'^) (Fig- 
ure 2 A, C, D and Table 1) and gave higher AUG values 
than the simple DSp matrix for networks of size 50 (8% 
increase, p < 0.05) (Figure 2D and Table 1). With direc- 
ted-signed networks the d-SP matrix was more effective 
than SOO although p values were borderline except for 
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Table 1 Accuracy of GRN inference with MOO-dSp 



Type Size SOO D Sp MOO dSp 



Undirected 


20 


0.68 


0.77 


0.79 


Undirected 


35 


0.68 


0.73 


0.79 


Undirected 


50 


0.65 


0.68 


0.76 


Directed-signed 


20 


0.27 


0.32 


0.23 


Directed-signed 


35 


0.27 


0.30 


0.22 


Directed-signed 


50 


0.24 


0.30 


0.31 



The table shows the average AUC values obtained for 20, 35 and 50-gene 
networks, for undirected and directed-signed networks, with the SOO, D-Sp 
and MOO-dSp procedures. 



the larger 50 genes network size (7% increase, p < 0.01) 
(Figure 2B, D, F and Table 1), 

Overall, we can conclude that in the event that only 
replicated time-course experiments are available, a situa- 
tion which is not uncommon, the integration between 
two methodologies can lead to a dynamical model with 
better accuracy than one solely based on a SOO 
procedure. 

Integrating time-course and gene inactivation 
experiments within the MOO framework: A ratio-based 
procedure 

Having shown that MOO is an effective approach to 
combine different network-inference methodologies we 
set to test whether it may also provide a solution to 
integrate time-course and gene-inactivation experiments. 

We initially approached this challenge by applying the 
MOO-Sr and MOO-Tr procedures to simulated data, 
representing gene expression in KO experiments either 
at the steady state or at a single time point tp after inac- 
tivation. We discovered that AUC could vary consider- 
ably (up to 25%) depending on the value of tp (Figure 
SI in Additional File 1), suggesting that the choice of 
the right time-point was an important factor. We also 
observed that the time point at which the largest num- 
ber of gene expression profiles had the highest deriva- 
tive often lead to higher AUC values within a tolerance 
of 5% (Figure SI in Additional File 1). Although this has 
not to be considered a criterion to identify the optimal 
tp value we believe it represents a useful guideline. 
MOO substantially improved the prediction of undir- 
ected networks, with all network sizes tested. The lar- 
gest gain we observed was a 20% increase respect to 
SOO with 35-gene networks, with the MOO-Sr proce- 
dure {p < 10'^) (Figures 3 A, C, E, Table 1 and Table 2). 
Overall, the MOO-Sr procedure also gave consistently 
higher AUC values than MOO-Tr although p values 
were borderline significant {p value = 0.16). Combining 
T r with S-r in the MOO procedure (MOO-(Tr-hSr)) 
did not further improve the accuracy of network infer- 
ence (Figures 3A, C and 3E and Table 2). For direct- 
signed networks, only MOO-Tr gave consistent higher 



AUC values respect to SOO although p values were bor- 
derline significant {p value = 0.12) (Figures 3B, D, F and 
Table 2). 

Integrating time course and gene inactivation 
experiments within the MOO framework: A correlation- 
based procedure 

In this section, we describe the results of the correla- 
tion-based procedure to construct MOO objectives from 
mutant gene expression data. As detailed in the methods 
section, this approach works by computing the correla- 
tion between the expression profiles of every pair of 
genes across the mutant samples. 

We discovered that inference accuracy of the ratio 
and correlation methods had opposite trends with 
respect to undirected and directed-signed networks. 
More precisely, the correlation-based objectives gave 
higher AUC values for direct-signed networks and 
lower AUC values than the ratio method for undir- 
ected networks. The differential in AUC values 
between the two methods was statistically significant 
for both undirected and directed-signed networks (up 
to 12% with undirected networks and up to 37% with 
directed-signed networks, p < 0.05 and p < 10'^ 
respectively) (Figure 4 and Table 2) Interestingly, the 
largest differential corresponded to the directed-signed 
larger 50-gene networks (37%, p < 10'^). 

We discovered that the method of correlation is effi- 
cient even when a partial dataset is available. Figure 5 
shows the results of the analysis for a 50-gene network 
when KO data is available for 50% of the genes. We did 
not observe any increase in inference accuracy for 
undirected networks with the MOO-Sc and MOO-Tc 
procedures (Figure 5 A and Table 3). However, a consid- 
erable increase in accuracy was detected when inferring 
directed-signed networks (Figure 5A and Table 3, up to 
27% improvement versus a SOO approach, p < 10'^). 

Combining ratio and correlation-based procedures further 
improve inference acuracy 

Since we have shown that correlation and ratio-based 
methods provide complementary information, we 
decided to test whether combining them using an 
ensemble approach could result in an even higher accu- 
racy of the network inference process. 

This approach was successful. AUC values for the 
ensemble models built from combining the MOO-Tr 
and MOO-Tc approaches (MOO-Tgns) were comparable 
to the best performing MOOTc models (Figure 6A, C 
and 6E and Table 2) whereas models built from combin- 
ing MOO-Sr and MOO-Sc (MOOSens) yield even higher 
AUC values than MOO-Sc models for the larger 35 and 
50-gene networks (15% and 10% increased AUC values, 
p < 0.05) (Figure 6D and 6F and Table 2). 
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Figure 3 Distribution of AUC values for ratio-based inference procedures. Boxplots representing the distribution of AUC values for 20, 35 
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Table 2 Accuracy of GRN inference by integrating gene KO datasets in the MOO frameworic 



Type 


Size 


Ratio methods MOO-Sr/MOO-Tr 


Corr. methods MOO-Sc/MOO-Tc 


Ensemble MOO Sens/MOO Tens 


Undirected 


20 


0.77/0.70 


0.70/0.65 


0.77/0.70 


Undirected 


35 


0.88/0.79 


0.75/0.70 


0.88/0.79 


Undirected 


50 


0.85/0.79 


0.75/0.69 


0.85/0.79 


Directed-signed 


20 


0.23/0.24 


0.42/0.36 


0.47/0.39 


Directed-signed 


35 


0.18/0.21 


0.54/0.49 


0.69/0.57 


Directed-signed 


50 


0.17/0.22 


0.54/0.45 


0.64/0.51 



The table shows the AUC values obtained for 20, 35 and 50-gene networks, for undirected and direct-signed networks with MOO procedures integrating time 
course and gene KO datasets. Ratio, correlation and ensemble-based methods are shown in separate columns. AUC values for different procedures within each 
column are separated by a forward slash. Note that we marked the AUC values in bold to highlight the opposite trend in inference accuracy of the ratio and 
correlation procedures. 
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Overall, MOO-Sens was the best performing procedure 
in inferring directed-signed networks. Therefore, we 
concluded that if both time-course and KO data are 
available for a sub-set of genes of interest, MOO-Sgns 
may be the procedure of choice. 

Modelling biological systems with NIMOO 

In order to test the validity of NIMOO to model real 
biological systems, we have analysed two datasets 



generated in our own laboratory. The first was a repli- 
cated gene-expression-profiling time-course experiment 
representing a model of in vivo glioblastoma develop- 
ment. A sub-set of these data were modelled with the 
MOO-Sp procedure. The second dataset included a 
time course representing the transcriptional response of 
E. coli during acid adaptation and the expression profil- 
ing of a compendium of 26 mutants exposed to acid. 
Because of the availability of both time-course and 
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Figure 5 MOO with incomplete gene KO datasets. Boxplots representing the distribution of average AUC values for 50 gene undirected 
(panel A) and directed-signed (panel B) networks. Accuracy of GRN reconstruction is given for the SOO, MOO-Tcsqo/o and MOO-Scsqo/o procedures. 
p values are indicated in red when significant {a = 0.05). Borderline p values and indicated in black {a = 0.2). 



mutant steady-state data we applied the MOO-Sgns 
procedure. 

Modelling in vivo tumour development 

Our model identified a network organized around three 
main hubs (NFE2L2, ERBB2 and HSPBl) (Figure 7B). 
NFE2L2 (Nuclear factor E2 p45-related factor 2; com- 
monly called Nrf2) is a transcription factor that binds to 
the cis-regulatory, antioxidant response element and 
transcriptionally up-regulate an environmental stress 
response program [28]. Nrf2 is critical for protection 
against a wide range of inflammatory conditions, hyper- 
oxia, fibrosis, hepatotoxicity, carcinogenesis, neurode- 
generation, cardiovascular disease and aging [29]. 
Inactivation of Nrf2 in some cancers, promote tumori- 
genicity and resistance to an array of chemotherapeutic 
compounds [30] . The biological role of Nrf2 as a master 
regulator of a crucial response is fully reflected in our 
model that identifies Nrf2 as the most upstream net- 
work node with the highest number of connections. 
Note that without the application of the MOO metho- 
dology this network feature was not inferred (Figure 
7A). 

The other network hubs are also known important 
signalling factors. ERBB2 is a gene that encode for a 



Table 3 Accuracy of GRN inference with partial coverage 
gene KO datasets 



Type 


Size 


SOO 


MOO-Tcsoo/o 


MOO-Scsoo/o 


Undirected 


50 


0.64 


0.64 


0.67 


Directed-signed 


50 


0.17 


0.38 


0.44 



The table shows the average AUC values obtained for 50-gene networks, for 
undirected and directed-signed networks, with the IV100-Tc5o% and MOO- 
ScsQo/o procedures. 



member of the epidermal growth factor (EGF) receptor 
family of receptor tyrosine kinases. This protein has no 
ligand-binding domain but it does bind tightly to other 
ligand-bound EGF receptor family members enhancing 
kinase-mediated activation of downstream signaling 
pathways. HSPBlhas a cytoprotective function and sup- 
port of cell survival under stress conditions. This gene is 
also involved in the apoptotic signalling pathway and 
interacts with actin and intermediate filaments to pro- 
tect actin filaments from fragmentation. It also preserves 
the focal contacts fixed at the cell membrane. These 
observations support the hypothesis that Nrf2 sits high 
in the hierarchy of events leading to the development of 
a fully vascularized tumour. 

Reverse engineering an E coli acid response network 

Both single objective (SOO) and multiobjective (MOO) 
optimization methods were applied to investigate regula- 
tory networks representative of E. coli acid adaptation. 
In order to test the full potential of the NIMOO metho- 
dology, we used both time-course and gene-inactivation 
experiments. 

The networks identified using either the time course 
data (SOO procedure) or the combination of time 
course and gene KO profiles (MOO procedure) are 
represented in Figure 7C and 7F respectively. In order 
to validate the model, we have compared our results 
with the gene interactions known in literature or 
extracted from the REGULON DB database [31]. 

The SOO method identified a number of gene con- 
nections that were known to play a role in acid adapta- 
tion. These included the interaction between two of the 
glutamate-dependent acid-stress response genes gadW 
and gadX [32]. However, in this model the directions of 
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Figure 6 Combining MOO procedures. Boxplots representing the distribution of AUC values for 20, 35 and 50-gene networl<s obtained by tine 
application of ensemble approach combining correlation and ratio-based MOO procedures. Accuracy of GRN reconstruction for directed-signed 
networks is given for the MOO-Tr, MOO-Tc, MOO-Tens procedures (panels A, C and E) and for the MOO-Sr, MOO-Sc, MOO-Sens procedures (panels 
B, D and F). p values are indicated in red when significant {a = 0.05). Borderline p values and indicated in black {a = 0.2). p values are indicated 
in red when significant {a = 0.05). Borderline p values and indicated in black {a = 0.2). 



the gene interactions are mostly incorrect and not 
representative of the known E. coli acid response 
mechanisms. For example, the coding glutamate decar- 
boxylase gene gadB is unlikely to be involved in the 
modulation of the two-component system PhoP/PhoQ. 

On the contrary, the gene regulatory network derived 
from the application of the MOO procedure (Figure 7D) 
includes several gene interactions known to be impor- 
tant in acid adaptation, 

A key interaction involved the two-component system 
PhoP/PhoQ [33]. This complex is a known upstream 
regulator of acid adaptation. The model we developed 
(Figure 7C) reflects the upstream regulatory role of this 



complex and correctly predicted its influence in the reg- 
ulation of the acid resistance genes gadW and hdeA 
[34]. The network also shows the known negative inter- 
action between gadX and gadW [32] and the inhibition 
of the crp gene by fis [35,36]. Another validated interac- 
tion found by the MOO procedure is represented by the 
link between the histone-like protein hns and cadA [37]. 
Our model shows that hns may activate the expression 
of cadA. The connection is consistent with the litera- 
ture, however, in GNB7145K hns mutants Shi et al. [37] 
have shown that hns acts as a repressor. 

Some of the interactions in the network represent 
potentially novel regulatory mechanisms in E, coli 
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response to acid stress. An example is the hypothesis 
that phoP may be involved in the activation of narX, a 
nitrite/nitrate sensor protein. This is a gene that is part 
of a two-component system regulating a component of 
anaerobic metabolism, which is a function known to be 
regulated during acid response [38]. 

Discussion 

In this paper we presented the gene regulatory network 
inference method "Network Inference with Multi Objec- 
tive Optimization" (NIMOO). 

When tested on simulated and "real world" datasets, 
NIMOO performs well even if incomplete data are avail- 
able. The main feature of this methodology is that it can 
be used to develop dynamical models of gene regulatory 
networks integrating multiple data sources and combin- 
ing any existing network inference methodology to iden- 
tifying the network topology. 




Although other methods have the potential to include 
prior knowledge in the inference process the ability to 
plug-in different inference methods in the same model- 
ling procedure is so far a unique feature of NIMOO. In 
this paper we tested this concept and demonstrated that 
the approach can be successful even if a relatively sim- 
ple procedure is integrated in the ODE model parameter 
estimation. However, a more comprehensive testing may 
be required to explore the full potential of this 
approach, for example combining more advanced meth- 
ods in the MOO optimization procedure. 

In terms of data integration, we have mainly focused 
on gene KO experiments. However, some of the proce- 
dures we have tested (e.g. MOO-Tc and MOO-Sc) are 
directly applicable to other types of experimental data. 
For example, a compendium of environmental and 
growth factor-induced perturbations could be employed 
to develop an objective compatible with these 



Figure 7 Inference of of biologically relevant networks. Gene regulatory networks obtained from the glioblastoma (panels A, B) and E. coli 
acid stress datasets (panels C and D). Networks obtained from SOO (panels A and C) and MOO (panels B and D) procedures are shown. 
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procedures. Such objectives could be for example com- 
puted by using the information theoretical approach 
ARACNE [5]. 

Moreover, additional information, for instance the 
confidence level in transcription factor binding consen- 
sus sequences in a gene's promoter region could also be 
incorporated within the optimisation process. More gen- 
erally, in the event that multiple objectives are used 
within a MOO procedure, each function's relative 
importance could be weighted by adjusting the optimi- 
zation parameters, such as weights 9^ (Equation 4). 
Additionally, any definite qualitative knowledge of the 
presence or absence of gene connections may be used 
as a constraint on the inferred gene-regulatory matrix 
(hard prior). 

Because of the ability to integrate different methods 
the user can very easily customize NIMOO. In this 
respect, NIMOO is an integration framework rather 
than a specific method. Comparing its performance 
with existing methods is therefore not necessarily con- 
sequential. However, we have performed an initial 
assessment comparing some implementations of 
NIMOO to other methods. For example, all NIMOO 
procedures outperformed TSNI [10] in inferring undir- 
ected networks (Table SI in Additional File 1) and the 
MOO-Sens and MOO-Tens performed better with both 
undirected and direct-signed networks (Table SI in 
Additional File 1). Moreover, NIMOO performed in a 
comparable manner to the method developed by Yip et 
al. [15], which won the DREAMS competition http:// 
wiki.c2b2.columbia.edu/dream/index.php/ (Table S2 in 
Additional File 1). 

So far the application of multi-objective optimization 
methods to inferring biological networks has been lim- 
ited: For example, van Someren et al. [39] and Fome- 
kong-Nanfack et al. [40] used MOO to incorporate 
multiple constraints arising from the requirement of sta- 
bility and robustness of gene networks, and, Liu and 
Wang [41] have used MOO to infer biochemical net- 
works by simultaneously minimizing for the concentra- 
tion error and the slope error. However, in all these 
cases a single data set and a single reverse engineering 
criterion were used. 

Conclusions 

The network-inference framework NIMOO is flexible 
and can be used in many different scenarios, even when 
available information is incomplete. The application of 
NIMOO to biological datasets representing two different 
"real world" scenarios produced very interesting results. 
The analysis of the experimental datasets illustrated that 
inclusion of additional objectives from the same dataset 
could significantly improve our ability to identify key 
regulators of relevant biological processes. 



Additional material 



Additional File 1: A method comparison study and additional tables 
and figures as detailed in the body of the paper. 
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